PM 566 Assignment 1

Author

Dana Gonzalez

Assignment Details

In this assignment, I will use Environmental Protection Agency (EPA) air pollution data to determine whether daily concentrations of PM2.5 in California decreased between 2002 and 2022.

Step 1

Read CSV into Dataframe

The 2002 data contain 15,976 observations (rows) of 22 variables (columns). The 2022 data contain the same 22 variables but 59,756 observations.

Data_2002 <- read.csv("~/Desktop/PM 566/PM566-Labs/PM2.5_2002_Data.csv")
Data_2022 <- read.csv("~/Desktop/PM 566/PM566-Labs/PM2.5_2022_Data.csv")

Check Dimensions, Headers, Footers, Variable Names, and Variable Types

Again, this shows that the 2002 and 2022 data sets both have 22 variables (columns), and 15,976 and 59,756 observations (rows) of these variables, respectively.

dim(Data_2002)
[1] 15976    22
dim(Data_2022)
[1] 59756    22

There do not seem to be any obvious irregularities at the top of the data for either year.

head(Data_2002)
        Date Source  Site.ID POC Daily.Mean.PM2.5.Concentration    Units
1 01/05/2002    AQS 60010007   1                           25.1 ug/m3 LC
2 01/06/2002    AQS 60010007   1                           31.6 ug/m3 LC
3 01/08/2002    AQS 60010007   1                           21.4 ug/m3 LC
4 01/11/2002    AQS 60010007   1                           25.9 ug/m3 LC
5 01/14/2002    AQS 60010007   1                           34.5 ug/m3 LC
6 01/17/2002    AQS 60010007   1                           41.0 ug/m3 LC
  Daily.AQI.Value Local.Site.Name Daily.Obs.Count Percent.Complete
1              81       Livermore               1              100
2              93       Livermore               1              100
3              74       Livermore               1              100
4              82       Livermore               1              100
5              98       Livermore               1              100
6             115       Livermore               1              100
  AQS.Parameter.Code AQS.Parameter.Description Method.Code
1              88101  PM2.5 - Local Conditions         120
2              88101  PM2.5 - Local Conditions         120
3              88101  PM2.5 - Local Conditions         120
4              88101  PM2.5 - Local Conditions         120
5              88101  PM2.5 - Local Conditions         120
6              88101  PM2.5 - Local Conditions         120
                     Method.Description CBSA.Code
1 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
2 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
3 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
4 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
5 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
6 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
                          CBSA.Name State.FIPS.Code      State County.FIPS.Code
1 San Francisco-Oakland-Hayward, CA               6 California                1
2 San Francisco-Oakland-Hayward, CA               6 California                1
3 San Francisco-Oakland-Hayward, CA               6 California                1
4 San Francisco-Oakland-Hayward, CA               6 California                1
5 San Francisco-Oakland-Hayward, CA               6 California                1
6 San Francisco-Oakland-Hayward, CA               6 California                1
   County Site.Latitude Site.Longitude
1 Alameda      37.68753      -121.7842
2 Alameda      37.68753      -121.7842
3 Alameda      37.68753      -121.7842
4 Alameda      37.68753      -121.7842
5 Alameda      37.68753      -121.7842
6 Alameda      37.68753      -121.7842
head(Data_2022)
        Date Source  Site.ID POC Daily.Mean.PM2.5.Concentration    Units
1 01/01/2022    AQS 60010007   3                           12.7 ug/m3 LC
2 01/02/2022    AQS 60010007   3                           13.9 ug/m3 LC
3 01/03/2022    AQS 60010007   3                            7.1 ug/m3 LC
4 01/04/2022    AQS 60010007   3                            3.7 ug/m3 LC
5 01/05/2022    AQS 60010007   3                            4.2 ug/m3 LC
6 01/06/2022    AQS 60010007   3                            3.8 ug/m3 LC
  Daily.AQI.Value Local.Site.Name Daily.Obs.Count Percent.Complete
1              58       Livermore               1              100
2              60       Livermore               1              100
3              39       Livermore               1              100
4              21       Livermore               1              100
5              23       Livermore               1              100
6              21       Livermore               1              100
  AQS.Parameter.Code AQS.Parameter.Description Method.Code
1              88101  PM2.5 - Local Conditions         170
2              88101  PM2.5 - Local Conditions         170
3              88101  PM2.5 - Local Conditions         170
4              88101  PM2.5 - Local Conditions         170
5              88101  PM2.5 - Local Conditions         170
6              88101  PM2.5 - Local Conditions         170
                    Method.Description CBSA.Code
1 Met One BAM-1020 Mass Monitor w/VSCC     41860
2 Met One BAM-1020 Mass Monitor w/VSCC     41860
3 Met One BAM-1020 Mass Monitor w/VSCC     41860
4 Met One BAM-1020 Mass Monitor w/VSCC     41860
5 Met One BAM-1020 Mass Monitor w/VSCC     41860
6 Met One BAM-1020 Mass Monitor w/VSCC     41860
                          CBSA.Name State.FIPS.Code      State County.FIPS.Code
1 San Francisco-Oakland-Hayward, CA               6 California                1
2 San Francisco-Oakland-Hayward, CA               6 California                1
3 San Francisco-Oakland-Hayward, CA               6 California                1
4 San Francisco-Oakland-Hayward, CA               6 California                1
5 San Francisco-Oakland-Hayward, CA               6 California                1
6 San Francisco-Oakland-Hayward, CA               6 California                1
   County Site.Latitude Site.Longitude
1 Alameda      37.68753      -121.7842
2 Alameda      37.68753      -121.7842
3 Alameda      37.68753      -121.7842
4 Alameda      37.68753      -121.7842
5 Alameda      37.68753      -121.7842
6 Alameda      37.68753      -121.7842

The same goes for the bottom of the data (although I did have to check whether Yolo County was real).

tail(Data_2002)
            Date Source  Site.ID POC Daily.Mean.PM2.5.Concentration    Units
15971 12/10/2002    AQS 61131003   1                             15 ug/m3 LC
15972 12/13/2002    AQS 61131003   1                             15 ug/m3 LC
15973 12/22/2002    AQS 61131003   1                              1 ug/m3 LC
15974 12/25/2002    AQS 61131003   1                             23 ug/m3 LC
15975 12/28/2002    AQS 61131003   1                              5 ug/m3 LC
15976 12/31/2002    AQS 61131003   1                              6 ug/m3 LC
      Daily.AQI.Value      Local.Site.Name Daily.Obs.Count Percent.Complete
15971              62 Woodland-Gibson Road               1              100
15972              62 Woodland-Gibson Road               1              100
15973               6 Woodland-Gibson Road               1              100
15974              77 Woodland-Gibson Road               1              100
15975              28 Woodland-Gibson Road               1              100
15976              33 Woodland-Gibson Road               1              100
      AQS.Parameter.Code AQS.Parameter.Description Method.Code
15971              88101  PM2.5 - Local Conditions         117
15972              88101  PM2.5 - Local Conditions         117
15973              88101  PM2.5 - Local Conditions         117
15974              88101  PM2.5 - Local Conditions         117
15975              88101  PM2.5 - Local Conditions         117
15976              88101  PM2.5 - Local Conditions         117
                         Method.Description CBSA.Code
15971 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15972 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15973 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15974 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15975 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15976 R & P Model 2000 PM2.5 Sampler w/WINS     40900
                                    CBSA.Name State.FIPS.Code      State
15971 Sacramento--Roseville--Arden-Arcade, CA               6 California
15972 Sacramento--Roseville--Arden-Arcade, CA               6 California
15973 Sacramento--Roseville--Arden-Arcade, CA               6 California
15974 Sacramento--Roseville--Arden-Arcade, CA               6 California
15975 Sacramento--Roseville--Arden-Arcade, CA               6 California
15976 Sacramento--Roseville--Arden-Arcade, CA               6 California
      County.FIPS.Code County Site.Latitude Site.Longitude
15971              113   Yolo      38.66121      -121.7327
15972              113   Yolo      38.66121      -121.7327
15973              113   Yolo      38.66121      -121.7327
15974              113   Yolo      38.66121      -121.7327
15975              113   Yolo      38.66121      -121.7327
15976              113   Yolo      38.66121      -121.7327
tail(Data_2022)
            Date Source  Site.ID POC Daily.Mean.PM2.5.Concentration    Units
59751 12/01/2022    AQS 61131003   1                            3.4 ug/m3 LC
59752 12/07/2022    AQS 61131003   1                            3.8 ug/m3 LC
59753 12/13/2022    AQS 61131003   1                            6.0 ug/m3 LC
59754 12/19/2022    AQS 61131003   1                           34.8 ug/m3 LC
59755 12/25/2022    AQS 61131003   1                           23.2 ug/m3 LC
59756 12/31/2022    AQS 61131003   1                            1.0 ug/m3 LC
      Daily.AQI.Value      Local.Site.Name Daily.Obs.Count Percent.Complete
59751              19 Woodland-Gibson Road               1              100
59752              21 Woodland-Gibson Road               1              100
59753              33 Woodland-Gibson Road               1              100
59754              99 Woodland-Gibson Road               1              100
59755              77 Woodland-Gibson Road               1              100
59756               6 Woodland-Gibson Road               1              100
      AQS.Parameter.Code AQS.Parameter.Description Method.Code
59751              88101  PM2.5 - Local Conditions         145
59752              88101  PM2.5 - Local Conditions         145
59753              88101  PM2.5 - Local Conditions         145
59754              88101  PM2.5 - Local Conditions         145
59755              88101  PM2.5 - Local Conditions         145
59756              88101  PM2.5 - Local Conditions         145
                                         Method.Description CBSA.Code
59751 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59752 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59753 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59754 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59755 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59756 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
                                    CBSA.Name State.FIPS.Code      State
59751 Sacramento--Roseville--Arden-Arcade, CA               6 California
59752 Sacramento--Roseville--Arden-Arcade, CA               6 California
59753 Sacramento--Roseville--Arden-Arcade, CA               6 California
59754 Sacramento--Roseville--Arden-Arcade, CA               6 California
59755 Sacramento--Roseville--Arden-Arcade, CA               6 California
59756 Sacramento--Roseville--Arden-Arcade, CA               6 California
      County.FIPS.Code County Site.Latitude Site.Longitude
59751              113   Yolo      38.66121      -121.7327
59752              113   Yolo      38.66121      -121.7327
59753              113   Yolo      38.66121      -121.7327
59754              113   Yolo      38.66121      -121.7327
59755              113   Yolo      38.66121      -121.7327
59756              113   Yolo      38.66121      -121.7327

The str() function lets us double-check the number of observations and variables in each data set (both match the dim() output above). It also shows every variable's name and type, along with the first few values of each. Again, there don't seem to be any obvious irregularities.

str(Data_2002)
'data.frame':   15976 obs. of  22 variables:
 $ Date                          : chr  "01/05/2002" "01/06/2002" "01/08/2002" "01/11/2002" ...
 $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
 $ Site.ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
 $ POC                           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Daily.Mean.PM2.5.Concentration: num  25.1 31.6 21.4 25.9 34.5 41 29.3 15 18.8 37.9 ...
 $ Units                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
 $ Daily.AQI.Value               : int  81 93 74 82 98 115 89 62 69 107 ...
 $ Local.Site.Name               : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
 $ Daily.Obs.Count               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Percent.Complete              : num  100 100 100 100 100 100 100 100 100 100 ...
 $ AQS.Parameter.Code            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
 $ AQS.Parameter.Description     : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
 $ Method.Code                   : int  120 120 120 120 120 120 120 120 120 120 ...
 $ Method.Description            : chr  "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" ...
 $ CBSA.Code                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
 $ CBSA.Name                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
 $ State.FIPS.Code               : int  6 6 6 6 6 6 6 6 6 6 ...
 $ State                         : chr  "California" "California" "California" "California" ...
 $ County.FIPS.Code              : int  1 1 1 1 1 1 1 1 1 1 ...
 $ County                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
 $ Site.Latitude                 : num  37.7 37.7 37.7 37.7 37.7 ...
 $ Site.Longitude                : num  -122 -122 -122 -122 -122 ...
str(Data_2022)
'data.frame':   59756 obs. of  22 variables:
 $ Date                          : chr  "01/01/2022" "01/02/2022" "01/03/2022" "01/04/2022" ...
 $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
 $ Site.ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
 $ POC                           : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Daily.Mean.PM2.5.Concentration: num  12.7 13.9 7.1 3.7 4.2 3.8 2.3 6.9 13.6 11.2 ...
 $ Units                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
 $ Daily.AQI.Value               : int  58 60 39 21 23 21 13 38 59 55 ...
 $ Local.Site.Name               : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
 $ Daily.Obs.Count               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Percent.Complete              : num  100 100 100 100 100 100 100 100 100 100 ...
 $ AQS.Parameter.Code            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
 $ AQS.Parameter.Description     : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
 $ Method.Code                   : int  170 170 170 170 170 170 170 170 170 170 ...
 $ Method.Description            : chr  "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" ...
 $ CBSA.Code                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
 $ CBSA.Name                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
 $ State.FIPS.Code               : int  6 6 6 6 6 6 6 6 6 6 ...
 $ State                         : chr  "California" "California" "California" "California" ...
 $ County.FIPS.Code              : int  1 1 1 1 1 1 1 1 1 1 ...
 $ County                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
 $ Site.Latitude                 : num  37.7 37.7 37.7 37.7 37.7 ...
 $ Site.Longitude                : num  -122 -122 -122 -122 -122 ...

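Using the summary() function, we can see measures of central tendency and spread for the numeric variables, and the length and class of the character variables, for each year. The structure looks regular, but two things stand out: the 2022 data have a negative minimum daily PM2.5 concentration (-6.7) and a much larger maximum (302.5, versus 104.3 in 2002), and CBSA.Code has missing values in both years (929 in 2002 and 4,567 in 2022). The PM2.5 values are examined more closely in Step 4.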

summary(Data_2002)
     Date              Source             Site.ID              POC       
 Length:15976       Length:15976       Min.   :60010007   Min.   :1.000  
 Class :character   Class :character   1st Qu.:60290014   1st Qu.:1.000  
 Mode  :character   Mode  :character   Median :60590007   Median :1.000  
                                       Mean   :60549600   Mean   :1.581  
                                       3rd Qu.:60731002   3rd Qu.:1.000  
                                       Max.   :61131003   Max.   :6.000  
                                                                         
 Daily.Mean.PM2.5.Concentration    Units           Daily.AQI.Value 
 Min.   :  0.00                 Length:15976       Min.   :  0.00  
 1st Qu.:  7.00                 Class :character   1st Qu.: 39.00  
 Median : 12.00                 Mode  :character   Median : 56.00  
 Mean   : 16.12                                    Mean   : 59.28  
 3rd Qu.: 20.50                                    3rd Qu.: 72.00  
 Max.   :104.30                                    Max.   :185.00  
                                                                   
 Local.Site.Name    Daily.Obs.Count Percent.Complete AQS.Parameter.Code
 Length:15976       Min.   :1       Min.   :100      Min.   :88101     
 Class :character   1st Qu.:1       1st Qu.:100      1st Qu.:88101     
 Mode  :character   Median :1       Median :100      Median :88101     
                    Mean   :1       Mean   :100      Mean   :88215     
                    3rd Qu.:1       3rd Qu.:100      3rd Qu.:88502     
                    Max.   :1       Max.   :100      Max.   :88502     
                                                                       
 AQS.Parameter.Description  Method.Code  Method.Description   CBSA.Code    
 Length:15976              Min.   :117   Length:15976       Min.   :12540  
 Class :character          1st Qu.:120   Class :character   1st Qu.:23420  
 Mode  :character          Median :120   Mode  :character   Median :40140  
                           Mean   :297                      Mean   :33270  
                           3rd Qu.:707                      3rd Qu.:41740  
                           Max.   :810                      Max.   :49700  
                                                            NA's   :929    
  CBSA.Name         State.FIPS.Code    State           County.FIPS.Code
 Length:15976       Min.   :6       Length:15976       Min.   :  1.00  
 Class :character   1st Qu.:6       Class :character   1st Qu.: 29.00  
 Mode  :character   Median :6       Mode  :character   Median : 59.00  
                    Mean   :6                          Mean   : 54.78  
                    3rd Qu.:6                          3rd Qu.: 73.00  
                    Max.   :6                          Max.   :113.00  
                                                                       
    County          Site.Latitude   Site.Longitude  
 Length:15976       Min.   :32.63   Min.   :-124.2  
 Class :character   1st Qu.:34.07   1st Qu.:-121.4  
 Mode  :character   Median :35.36   Median :-119.1  
                    Mean   :36.00   Mean   :-119.4  
                    3rd Qu.:37.77   3rd Qu.:-117.9  
                    Max.   :41.71   Max.   :-115.5  
                                                    
summary(Data_2022)
     Date              Source             Site.ID              POC       
 Length:59756       Length:59756       Min.   :60010007   Min.   : 1.00  
 Class :character   Class :character   1st Qu.:60290019   1st Qu.: 1.00  
 Mode  :character   Mode  :character   Median :60631006   Median : 3.00  
                                       Mean   :60563315   Mean   : 3.77  
                                       3rd Qu.:60731026   3rd Qu.: 3.00  
                                       Max.   :61131003   Max.   :24.00  
                                                                         
 Daily.Mean.PM2.5.Concentration    Units           Daily.AQI.Value 
 Min.   : -6.700                Length:59756       Min.   :  0.00  
 1st Qu.:  4.100                Class :character   1st Qu.: 23.00  
 Median :  6.800                Mode  :character   Median : 38.00  
 Mean   :  8.429                                   Mean   : 39.28  
 3rd Qu.: 10.700                                   3rd Qu.: 54.00  
 Max.   :302.500                                   Max.   :454.00  
                                                                   
 Local.Site.Name    Daily.Obs.Count Percent.Complete AQS.Parameter.Code
 Length:59756       Min.   :1       Min.   :100      Min.   :88101     
 Class :character   1st Qu.:1       1st Qu.:100      1st Qu.:88101     
 Mode  :character   Median :1       Median :100      Median :88101     
                    Mean   :1       Mean   :100      Mean   :88192     
                    3rd Qu.:1       3rd Qu.:100      3rd Qu.:88101     
                    Max.   :1       Max.   :100      Max.   :88502     
                                                                       
 AQS.Parameter.Description  Method.Code  Method.Description   CBSA.Code    
 Length:59756              Min.   :143   Length:59756       Min.   :12540  
 Class :character          1st Qu.:170   Class :character   1st Qu.:31080  
 Mode  :character          Median :170   Mode  :character   Median :40140  
                           Mean   :336                      Mean   :34957  
                           3rd Qu.:707                      3rd Qu.:41860  
                           Max.   :810                      Max.   :49700  
                                                            NA's   :4567   
  CBSA.Name         State.FIPS.Code    State           County.FIPS.Code
 Length:59756       Min.   :6       Length:59756       Min.   :  1.00  
 Class :character   1st Qu.:6       Class :character   1st Qu.: 29.00  
 Mode  :character   Median :6       Mode  :character   Median : 63.00  
                    Mean   :6                          Mean   : 56.19  
                    3rd Qu.:6                          3rd Qu.: 73.00  
                    Max.   :6                          Max.   :113.00  
                                                                       
    County          Site.Latitude   Site.Longitude  
 Length:59756       Min.   :32.58   Min.   :-124.2  
 Class :character   1st Qu.:34.07   1st Qu.:-121.4  
 Mode  :character   Median :36.49   Median :-119.6  
                    Mean   :36.24   Mean   :-119.6  
                    3rd Qu.:37.96   3rd Qu.:-117.9  
                    Max.   :41.76   Max.   :-115.5  
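
To compare the two years directly, the PM2.5 summaries can be stacked into one small table. This is a minimal sketch on a handful of values; `toy_2002` and `toy_2022` are hypothetical stand-ins for Data_2002 and Data_2022, not the real data.

```r
# Toy stand-ins for the two yearly data frames (only a few example values).
toy_2002 <- data.frame(Daily.Mean.PM2.5.Concentration = c(25.1, 31.6, 21.4, 25.9))
toy_2022 <- data.frame(Daily.Mean.PM2.5.Concentration = c(12.7, 13.9, 7.1, 3.7))

# Stack the six-number summaries so that each year is one row.
# (This works as long as neither column has NAs; otherwise the
# summaries have different lengths.)
pm_summary <- rbind(
  "2002" = summary(toy_2002$Daily.Mean.PM2.5.Concentration),
  "2022" = summary(toy_2022$Daily.Mean.PM2.5.Concentration)
)
print(pm_summary)
```

Swapping in the real data frames should give a two-row table that makes the shift in the median and maximum easier to see than the full summary() printout.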
                                                    

Step 2

Combine 2002 and 2022 Data Into One Dataframe

Combined_Data <- rbind(Data_2002, Data_2022)
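
Because rbind() on data frames matches columns by name, it can be worth confirming that the two files really share the same columns and types before stacking them. The following is a minimal sketch under that assumption; `toy_2002`, `toy_2022`, and the `PM` column are hypothetical stand-ins, not the real data.

```r
# Toy stand-ins for Data_2002 and Data_2022.
toy_2002 <- data.frame(Date = "01/05/2002", Site.ID = 60010007L, PM = 25.1)
toy_2022 <- data.frame(Date = "01/01/2022", Site.ID = 60010007L, PM = 12.7)

# Confirm that column names and column classes agree before stacking.
same_names   <- identical(names(toy_2002), names(toy_2022))
same_classes <- identical(sapply(toy_2002, class), sapply(toy_2022, class))
stopifnot(same_names, same_classes)

# Stack the rows; the result has one row per input row.
toy_combined <- rbind(toy_2002, toy_2022)
nrow(toy_combined)
```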

Create New Year Column

Combined_Data$Date <- as.Date(Combined_Data$Date, format = "%m/%d/%Y")
Combined_Data$Year <- format(Combined_Data$Date, "%Y")
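
The date conversion can be checked on a few strings before applying it to the whole column. This is a small sketch; `raw_dates` is a made-up vector in the same month/day/year layout as the EPA files.

```r
# A few date strings in the same layout as the Date column.
raw_dates <- c("01/05/2002", "12/31/2002", "01/01/2022")

# Parse with an explicit format; strings that do not match become NA.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# format() returns character, so convert if a numeric year is wanted.
year_chr <- format(parsed, "%Y")
year_int <- as.integer(year_chr)

year_int
sum(is.na(parsed))  # number of dates that failed to parse
```

Note that format() yields a character year, which is why the Year column has to be converted before it can be treated as a number.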

Rename Key Variables

names(Combined_Data)[names(Combined_Data) == "Daily.Mean.PM2.5.Concentration"] <- "Daily_PM2.5"
names(Combined_Data)[names(Combined_Data) == "Daily.AQI.Value"] <- "Daily_AQI"
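
If more columns need renaming later, the same idea can be written once with a lookup vector. This is only a sketch of one possible approach; `toy` and `new_names` are hypothetical names introduced for illustration.

```r
# Toy data frame that carries the two original column names.
toy <- data.frame(
  Daily.Mean.PM2.5.Concentration = c(25.1, 12.7),
  Daily.AQI.Value = c(81L, 58L),
  County = c("Alameda", "Alameda")
)

# Lookup vector: names are the old column names, values are the new ones.
new_names <- c(
  Daily.Mean.PM2.5.Concentration = "Daily_PM2.5",
  Daily.AQI.Value = "Daily_AQI"
)

# Rename only the columns found in the lookup; the rest are left untouched.
hits <- names(toy) %in% names(new_names)
names(toy)[hits] <- unname(new_names[names(toy)[hits]])

names(toy)
```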

Step 3

Map of Collection Sites

Although the monitoring sites are spread throughout California, they are more concentrated along the coast and in or around major cities (e.g., Los Angeles, San Francisco, San Jose, San Diego). There are also relatively few sites in southeastern California (the eastern parts of San Bernardino, Riverside, and Imperial counties).

sum(is.na(Combined_Data$Year))
[1] 0
str(Combined_Data$Year)
 chr [1:75732] "2002" "2002" "2002" "2002" "2002" "2002" "2002" "2002" ...
Combined_Data$Year <- as.numeric(as.character(Combined_Data$Year))
unique(Combined_Data$Year)
[1] 2002 2022
Combined_Data <- Combined_Data[!is.na(Combined_Data$Year),]
Sites <- (unique(Combined_Data[,c("Site.Latitude","Site.Longitude")]))  
dim(Sites)
[1] 202   2
library(leaflet)

pal <- colorFactor(c("lightblue", "darkred"), domain = unique(Combined_Data$Year))

leaflet(data = Combined_Data) |>
  addProviderTiles("CartoDB.Positron") |>
  addCircles(lat = ~Site.Latitude, lng = ~Site.Longitude,
             opacity = 0.01, fillOpacity = 0.001, radius = 1, color = ~pal(Year))
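
The map above draws one circle per daily observation. An alternative, sketched here with base R only, is to reduce the data to one row per site and year first, which also makes it easy to count how many sites reported in each year. The data frame `toy` and its coordinates are hypothetical stand-ins for Combined_Data.

```r
# Toy stand-in for Combined_Data (made-up coordinates and years).
toy <- data.frame(
  Year = c(2002, 2002, 2002, 2022, 2022),
  Site.Latitude  = c(37.68, 37.68, 38.66, 37.68, 34.05),
  Site.Longitude = c(-121.78, -121.78, -121.73, -121.78, -118.24)
)

# Keep one row per site and year, so each site appears once per year
# instead of once per daily observation.
sites_by_year <- unique(toy[, c("Year", "Site.Latitude", "Site.Longitude")])

# Number of distinct sites in each year.
site_counts <- table(sites_by_year$Year)
site_counts
```

A data frame like `sites_by_year` could then be passed to leaflet(data = ...) in place of the full data set, which would likely allow a higher opacity without overplotting.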

Step 4

Based on some quick Google searches, most of these daily PM2.5 values seem plausible. Annual averages for California (specifically, Los Angeles) fall around 9 μg/m3, and daily averages can be as high as 35 μg/m3 for the same areas.

We see values much higher than this in our dataset (upwards of 50-69 μg/m3, with maxima of 104.3 μg/m3 in 2002 and 302.5 μg/m3 in 2022). These values may still be valid, as events like wildfires can drastically raise daily PM2.5 averages (e.g., smoke from the 2018 Camp Fire reportedly led to a daily PM2.5 concentration of 263 μg/m3 in Sacramento, among the highest recorded in California).
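
One way to see how common these high days are is to flag values above a cutoff and tabulate them by year. This is a minimal sketch on a toy data frame; the 35 μg/m3 cutoff is an assumption taken from the daily figure quoted above, and `toy` stands in for Combined_Data.

```r
# Toy stand-in for Combined_Data with the renamed PM2.5 column.
toy <- data.frame(
  Year = c(2002, 2002, 2002, 2022, 2022, 2022),
  Daily_PM2.5 = c(12.0, 41.0, 104.3, 6.8, 8.4, 302.5)
)

# Assumed cutoff: the 35 ug/m3 daily figure mentioned in the text.
cutoff <- 35

# Flag days above the cutoff, then count and average the flags per year.
toy$High <- toy$Daily_PM2.5 > cutoff
high_count <- tapply(toy$High, toy$Year, sum)   # number of high days
high_share <- tapply(toy$High, toy$Year, mean)  # proportion of high days

high_count
high_share
```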

We also see a number of negative daily PM2.5 concentrations. After some more Google searches, I learned that this can happen in two main circumstances: either there is a problem with the measuring instrument, or the measurement is taken while the atmosphere is extremely clean (approaching 0 μg/m3) and measurement noise pushes the reading below zero.

After a quick skim of the data, I'm leaning towards the latter explanation for this data set's negative values, as the majority of them fall between -1.0 and 0 μg/m3.
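
The skim of the negative values can be made concrete with a few summary numbers. The sketch below uses an invented vector `toy_pm` in place of the real Daily_PM2.5 column, and the -1 threshold is the one mentioned in the text.

```r
# Toy stand-in for the Daily_PM2.5 column (made-up values).
toy_pm <- c(-6.7, -0.8, -0.3, -0.1, 0.0, 4.1, 6.8, 10.7)

# Pull out the negative readings.
negatives <- toy_pm[toy_pm < 0]

n_negative     <- length(negatives)      # how many readings are below zero
share_negative <- mean(toy_pm < 0)       # proportion of all readings
share_small    <- mean(negatives >= -1)  # proportion of negatives within 1 of zero

summary(negatives)
```

If `share_small` is close to 1 on the real data, that would support the measurement-noise explanation; a large share of strongly negative readings would point more toward instrument problems.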

There do not seem to be any missing values for our variables of interest.
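
A quick way to confirm this is to count missing values per column. This is a sketch on a toy data frame; the choice of `key_vars` as the "variables of interest" is an assumption and should be adjusted to match the analysis.

```r
# Toy stand-in for Combined_Data, with one missing CBSA.Code.
toy <- data.frame(
  Daily_PM2.5 = c(25.1, 12.7, 3.4),
  Daily_AQI = c(81L, 58L, 19L),
  Site.Latitude = c(37.69, 37.69, 38.66),
  CBSA.Code = c(41860L, NA, 40900L)
)

# Assumed set of variables of interest.
key_vars <- c("Daily_PM2.5", "Daily_AQI", "Site.Latitude")

# Missing values per column: first the key columns, then every column.
na_key <- colSums(is.na(toy[, key_vars]))
na_all <- colSums(is.na(toy))

na_key
na_all
```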

Step 5

Exploratory Graphs

library(ggplot2)